[DO NOT MERGE] Implement Qwen2 aligned with PaddleFormers #11189

Draft
JunnYu wants to merge 1 commit into PaddlePaddle:develop from JunnYu:precision_check_qwen2.5

Conversation

JunnYu (Member) commented Dec 9, 2025

Before submitting

  • Lint code. If there are lint issues, please format the code first.
# Install and register `pre-commit` in the project folder
pip install pre-commit && pre-commit install

# Process previous code files separately
pre-commit run --files XXXX.py
  • Add test cases to the tests folder. If there are codecov issues, please add test cases first.

PR types

PR changes

Description

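These changes align PaddleNLP's Qwen2.5 with the PaddleFormers implementation.

  • Align RoPE (rotary position embedding) precision: compute in FP32 to improve accuracy.
  • Align the fuse configuration logic: enable fuse_swiglu and fuse_rms_norm.
  • Align the fused RMS norm implementation: both sides use the framework implementation (compiled with fast math disabled).
  • Align the LM head weight shape
    • PaddleFormers implementation: logits = paddle.matmul(x, weight, transpose_y=True), where weight has shape [vocab_size, hidden_size]
    • PaddleNLP implementation:
      • logits = paddle.matmul(x, weight, transpose_y=False), where weight has shape [hidden_size, vocab_size]
      • [Approach 1] Computing logits = paddle.matmul(x, weight.t(), transpose_y=True) with weight of shape [hidden_size, vocab_size] keeps the forward precision identical, but in the backward pass the gradients do not match once main_grad is enabled during training.
      • [Approach 2] Therefore, the PaddleNLP network definition must be changed so that PaddleNLP's weight shape is exactly the same as PaddleFormers'.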
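
The two LM head weight layouts described above can be compared with a small sketch. This is not the code from this PR: it uses NumPy as a stand-in for `paddle.matmul`, and the helper `lm_head_logits`, the toy sizes, and the upstream gradient `g` are all hypothetical. It only shows that both layouts give the same logits while the weight gradient comes out in the stored layout of each parameter. It does not reproduce or explain the main_grad mismatch reported above.

```python
import numpy as np


def lm_head_logits(x, weight, transpose_y):
    """NumPy stand-in (assumption) for paddle.matmul(x, weight, transpose_y=...)."""
    return x @ (weight.T if transpose_y else weight)


rng = np.random.default_rng(0)
batch, hidden_size, vocab_size = 2, 8, 16  # hypothetical toy sizes

x = rng.standard_normal((batch, hidden_size)).astype(np.float32)

# PaddleFormers-style layout: [vocab_size, hidden_size], used with transpose_y=True.
w_formers = rng.standard_normal((vocab_size, hidden_size)).astype(np.float32)

# PaddleNLP-style layout: [hidden_size, vocab_size], used with transpose_y=False.
# Same values as w_formers, stored transposed.
w_nlp = np.ascontiguousarray(w_formers.T)

logits_formers = lm_head_logits(x, w_formers, transpose_y=True)
logits_nlp = lm_head_logits(x, w_nlp, transpose_y=False)

# Hand-written weight gradients for the scalar loss sum(logits * g),
# where g is an arbitrary upstream gradient of shape [batch, vocab_size].
g = rng.standard_normal((batch, vocab_size)).astype(np.float32)
grad_formers = g.T @ x  # shape [vocab_size, hidden_size]
grad_nlp = x.T @ g      # shape [hidden_size, vocab_size]

print("logits match:", np.allclose(logits_formers, logits_nlp, atol=1e-4))
print("grad shapes:", grad_formers.shape, grad_nlp.shape)
```

Because each gradient follows the layout of the parameter it belongs to, comparing gradients or accumulated gradient buffers across the two code bases requires a transpose unless the weights share the same shape, which is what Approach 2 achieves.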

paddle-bot bot commented Dec 9, 2025

Thanks for your contribution!

github-actions bot commented Feb 8, 2026

This Pull Request is stale because it has been open for 60 days with no activity.

github-actions bot added the stale label on Feb 8, 2026